A Brief History of NPB


Goal: Measure performance of modern (parallel) architectures running scientific apps

Contents: 5 kernels, 3 pseudo apps (implicit CFD) 

Approach: NPB1: Paper-and-pencil specs

NPB2: Source code implementations 
(F77/C/MPI) 

PBN: Source code implementations 
(HPF/OpenMP/Java) 




Kernels: 

• EP 

Random-number generator 


• IS 

Integer sort 


• CG 

Conjugate gradient 


• MG 

Multigrid method for Poisson eqn


• FT 

Spectral method (FFT) for Laplace eqn 

Pseudo apps:

• BT

ADI; Block-Tridiagonal systems

• SP 

ADI; Scalar Pentadiagonal systems 


• LU 

Lower-Upper symmetric Gauss-Seidel 
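
The CG kernel is named after the conjugate gradient iteration. As a minimal sketch of that iteration only (not of the NPB kernel, its sparse matrix, or its problem sizes), the following solves a small symmetric positive-definite system. The names `conjugate_gradient` and `dense_matvec` are hypothetical helpers chosen for illustration.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive-definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the initial guess x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # residual norm small enough: stop
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        # New direction is the residual made conjugate to the previous direction
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def dense_matvec(A):
    """Wrap a dense list-of-rows matrix as a matrix-vector product (illustration only)."""
    return lambda v: [sum(a * vi for a, vi in zip(row, v)) for row in A]


A = [[4.0, 1.0], [1.0, 3.0]]
x = conjugate_gradient(dense_matvec(A), [1.0, 2.0])
```

Passing the matrix as a `matvec` callable keeps the solver independent of storage format, which is where a sparse, distributed implementation would differ from this dense toy.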
What is tested in NPB2?


Name    Math functions    Network bandwidth    Network latency    Memory bandwidth

EP 


. 
/


✓ 

CG 





MG 
✓
✓




err 
SP


✓ 

LU 

/ 



What is not in NPB2? 

• Dynamically changing memory accesses

• Irregular memory accesses

• False sharing / cache coherence costs

• System software / Grid computing middleware

• Fault tolerance

• Wide area (public) network bandwidth/latency


Unstructured Adaptive Mesh Refinement (UA)
• Structured, static grids → fixed-stride memory access
• Message passing → private address spaces
Choice of parallelization paradigm not (very) important
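
To illustrate why a structured, static grid gives fixed-stride memory access, here is a small sketch assuming a row-major 3-D array; it is contrasted with an index-table gather of the kind an unstructured mesh needs. The function names, the grid dimensions, and the `neighbours` table are hypothetical and chosen only for illustration.

```python
def flat_index(i, j, k, ny, nz):
    """Row-major offset of grid point (i, j, k) in an nx * ny * nz array."""
    return (i * ny + j) * nz + k


def sweep_offsets(j, k, nx, ny, nz):
    """Offsets visited when sweeping along i with j and k held fixed."""
    return [flat_index(i, j, k, ny, nz) for i in range(nx)]


nx, ny, nz = 4, 3, 5
offsets = sweep_offsets(1, 2, nx, ny, nz)
# Every step along i moves by the same distance, ny * nz
strides = {b - a for a, b in zip(offsets, offsets[1:])}

# Irregular access for contrast: addresses come from a connectivity table,
# so consecutive reads have no fixed distance between them.
neighbours = [3, 0, 2, 1]            # hypothetical mesh connectivity
values = [10.0, 20.0, 30.0, 40.0]
gathered = [values[n] for n in neighbours]
```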

Unstructured, adaptive mesh refinement (Biswas, Oliker) 

• No solver - only refine, (re)partition, and remap

• Implementations: 

> MPI - SGI Origin, Cray T3E

> OpenMP - SGI Origin

> Multithreading - Tera (Cray) MTA


UA (cont'd) 

Different kinds of triangular refinement 
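
The figure for this slide is not reproduced here. As one example of triangular refinement, the sketch below assumes regular 1:4 subdivision, where each edge is split at its midpoint. Other kinds of refinement exist, and nothing here is taken from the original implementation. All function names are hypothetical.

```python
def midpoint(p, q):
    """Midpoint of the edge between 2-D points p and q."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def refine_1_to_4(tri):
    """Split a triangle into four children by connecting its edge midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    # Three corner triangles plus the central one
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def area(tri):
    """Unsigned area of a triangle from the 2-D cross product."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2.0


parent = ((0.0, 0.0), (2.0, 0.0), (0.0, 2.0))
children = refine_1_to_4(parent)
```

Each child has a quarter of the parent's area, so the children tile the parent exactly.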






UA (cont'd)
Overview of PLUM



Complete, practically useful benchmark: 

• Includes solver 

• Is 3-dimensional

• Does coarsening in addition to refinement 

• Spends little time on (re)partitioning/(re)mapping 

• Is well load balanced 

• Requires no data files 

• Is compact: NPB2 apps ~ 5,000 lines (SP,BT,LU) 



UA (cont'd)


Overall assessment 


Paradigm          Code increase    Memory increase    Scalability    Portability
MPI               100%             70%                Medium         High
OpenMP            10%              5%                 None           Medium
Multithreading    2%               7%                 High           Low



Flame propagation problem (Feng & Mavriplis): 

• Scalar transport eqn: $T_t + \mathbf{V} \cdot \nabla T = \epsilon \Delta T - \epsilon^{-1} F(T)$

• Velocity field given: linear problem 

• Rectangular domain 

• Rectangular nonconforming elements 

• Spectral elements of relatively high order (5th)

• Mixed explicit/implicit time integration 

• Spatial refinement/coarsening 

• Interface ops cheap compared to overall scheme 
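
As a rough illustration of mixed explicit/implicit time integration for a scalar transport equation, the sketch below advances a 1-D advection-diffusion problem with explicit upwind advection and implicit diffusion. It rests on several simplifying assumptions that are not from the slides: finite differences instead of spectral elements, one space dimension, constant positive velocity, no source term F(T), and T = 0 at both ends. The helper names `thomas` and `imex_step` and all parameter values are hypothetical.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                       # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def imex_step(T, v, eps, dx, dt):
    """One step of T_t + v T_x = eps T_xx, with v > 0 and T = 0 at both ends."""
    n = len(T)
    cfl = v * dt / dx
    # Explicit part: first-order upwind advection with zero inflow on the left
    star = [T[i] - cfl * (T[i] - (T[i - 1] if i > 0 else 0.0)) for i in range(n)]
    # Implicit part: backward Euler diffusion, a tridiagonal solve
    r = eps * dt / dx ** 2
    return thomas([-r] * n, [1.0 + 2.0 * r] * n, [-r] * n, star)


n, dx, dt, v, eps = 50, 0.02, 0.01, 1.0, 0.01
T = [1.0 if 10 <= i < 20 else 0.0 for i in range(n)]
for _ in range(20):
    T = imex_step(T, v, eps, dx, dt)
```

Treating diffusion implicitly removes its stability limit on the time step, so only the advective condition v dt / dx <= 1 restricts dt in this sketch.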
























NGB (cont'd)


Must define boundaries because: 

• Grid concept/environment/infrastructure not well defined

• Grid has very many software/hardware components 

• Grid has complex functional hierarchy 

• Grid is a time/organization/application dependent target

Cannot freeze benchmark implementation because:

• No clear Grid environment winner

• No clear picture of representative Grid application